
refactor: Mutil Column Aggregate Function State Serialization Interface #18398


Open
wants to merge 8 commits into
base: main

forsaken628 (Collaborator) commented Jul 21, 2025

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Serializing the aggregate function state into multiple columns, rather than a single binary column, reduces the serialized size and the resulting I/O.
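To make the idea concrete, here is a minimal sketch of the two layouts for an AVG-style state. All names here (`AvgState`, `AvgStateColumns`, `serialize_binary`, `serialize_columns`, `deserialize_columns`) are hypothetical and are not taken from this PR or from Databend's actual interfaces; the sketch only assumes a state made of a sum and a count.

```rust
// Hypothetical illustration only; these types are not Databend's API.

/// State of an AVG-style aggregate for one group.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct AvgState {
    pub sum: f64,
    pub count: u64,
}

/// Single-binary-column layout: one opaque byte blob per group.
pub fn serialize_binary(states: &[AvgState]) -> Vec<Vec<u8>> {
    states
        .iter()
        .map(|s| {
            let mut buf = Vec::with_capacity(16);
            buf.extend_from_slice(&s.sum.to_le_bytes());
            buf.extend_from_slice(&s.count.to_le_bytes());
            buf
        })
        .collect()
}

/// Multi-column layout: one typed column per state field.
#[derive(Debug, Clone, PartialEq)]
pub struct AvgStateColumns {
    pub sums: Vec<f64>,
    pub counts: Vec<u64>,
}

/// Splits the per-group states into one column per field.
pub fn serialize_columns(states: &[AvgState]) -> AvgStateColumns {
    AvgStateColumns {
        sums: states.iter().map(|s| s.sum).collect(),
        counts: states.iter().map(|s| s.count).collect(),
    }
}

/// Rebuilds the per-group states from the typed columns.
pub fn deserialize_columns(cols: &AvgStateColumns) -> Vec<AvgState> {
    cols.sums
        .iter()
        .zip(cols.counts.iter())
        .map(|(&sum, &count)| AvgState { sum, count })
        .collect()
}
```

In the binary layout every group carries its own blob, which a columnar format typically has to store with per-row offsets. In the multi-column layout each field is a plain typed column, which is the property the summary relies on for smaller output.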

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):


@github-actions github-actions bot added the pr-refactor this PR changes the code base without new features or bugfix label Jul 21, 2025
1 1 1.0

# fix me
# SELECT MAX(a), MIN(b), weighted_avg(a,b) from t;
forsaken628 (Collaborator, Author) commented:

There are some compatibility issues between aggregating indexes and UDAFs (user-defined aggregate functions).

@@ -186,8 +186,8 @@ impl SinkAnalyzeState {
 let index: u32 = name.strip_prefix("ndv_").unwrap().parse().unwrap();

 let col = col.index(0).unwrap();
-let col = col.as_binary().unwrap();
-let hll: MetaHLL = borsh_deserialize_from_slice(col)?;
+let data = col.as_tuple().unwrap()[0].as_binary().unwrap();
forsaken628 (Collaborator, Author) commented:

Does changing the aggregate state serialization type make previously persisted files incompatible?
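For context on this question, here is a rough sketch of what tolerating both layouts on read could look like. It is purely illustrative: `StateColumn` and `state_bytes` are hypothetical stand-ins, not Databend types, and nothing in this PR indicates that such a fallback is implemented.

```rust
// Hypothetical stand-in for a scalar read from a persisted column.
#[derive(Debug, Clone, PartialEq)]
pub enum StateColumn {
    /// Legacy layout: the whole state in one binary blob.
    Binary(Vec<u8>),
    /// New layout: a tuple whose first field holds the binary payload.
    Tuple(Vec<StateColumn>),
}

/// Returns the serialized state bytes from either layout,
/// or `None` if the value matches neither shape.
pub fn state_bytes(col: &StateColumn) -> Option<&[u8]> {
    match col {
        StateColumn::Binary(bytes) => Some(bytes.as_slice()),
        StateColumn::Tuple(fields) => match fields.first() {
            Some(StateColumn::Binary(bytes)) => Some(bytes.as_slice()),
            _ => None,
        },
    }
}
```

A reader written this way would accept files produced before and after the layout change, at the cost of keeping the legacy branch around.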

@forsaken628 forsaken628 changed the title refactor: New Aggregate Function State Serialization Interface refactor: Mutil Column Aggregate Function State Serialization Interface Jul 23, 2025
@forsaken628 forsaken628 marked this pull request as ready for review July 23, 2025 12:42
@forsaken628 forsaken628 requested review from sundy-li and b41sh July 23, 2025 12:42